Chapter 12 · Databases

Caching

Understanding the mechanism that makes high-performance backend systems fast — from Google Search to Netflix to Redis.


01 · DEFINITION

What is Caching?

Caching is a mechanism using which we decrease the amount of time and effort it takes to perform some work.

That one line is the entire concept summarised. But let's unpack it properly because the implications run deep throughout all of backend engineering.

The Two-Part Definition

There are two ways to understand caching — a plain English version and a technical one. Both say exactly the same thing, just at different levels of precision:

Plain English Definition

Caching is a mechanism that reduces the time and effort needed to perform some amount of work. That is the one-line explanation of what caching is.

Technical Definition

Suppose we have a primary data source. Caching means keeping a subset of that data (not the whole data set) in a location that is faster and cheaper to access. Which subset we keep depends on many parameters: how the data is used, how frequently it is used, the probability that it will be needed next, how recently it was used, and so on. Technically speaking, then, caching is a mechanism that reduces the time and effort needed to retrieve data or perform an operation.

Why "Subset" is the Most Important Word

The definition says subset — not all the data, not a copy of everything, but a carefully chosen portion of the primary data. This is critical for two reasons:

Parameters That Govern What to Cache

When designing a caching layer, these are the parameters you evaluate to decide what deserves to be in the cache:

Why Caching Matters — The High Performance Context

This single mechanism — caching — is a huge factor in a lot of high performance applications. When we say high performance, we mean applications that track latency in two-digit microseconds or milliseconds. At that scale, even a 5ms difference in response time is noticeable, and a 50ms difference is unacceptable.

Without caching, high-traffic systems face two impossible bottlenecks:

The Two Core Scenarios Where Caching is Essential

1. Heavy computation — When generating a result requires significant CPU, GPU, or memory resources (ML inference, complex joins across millions of rows, aggregation pipelines). You don't want to redo this for every single user request.

2. Heavy data transfer — When the data being sent is large (video files, image libraries, large JSON payloads) and sending it over the network for every request would be slow and expensive. You want that data to already be close to the user.

These two scenarios — avoid expensive recomputation and avoid redundant heavy data transfer — are the patterns you'll recognise in every single caching use case you encounter in your career. Whenever you see either of these two situations, your first instinct should be: "Can we cache this?"
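As a minimal sketch of the "avoid expensive recomputation" scenario, the standard library's functools.lru_cache can memoize a function so that repeated calls with the same argument skip the work. The function name expensive_report and its workload are hypothetical stand-ins, not part of any real system described here.

```python
from functools import lru_cache

call_count = 0  # counts how often the expensive function body actually runs


@lru_cache(maxsize=128)
def expensive_report(n: int) -> int:
    """Stand-in for a heavy computation (hypothetical example workload)."""
    global call_count
    call_count += 1
    return sum(i * i for i in range(n))


first = expensive_report(100_000)   # computed: the function body runs
second = expensive_report(100_000)  # served from the cache: the body is skipped
```

Both calls return the same value, but the body runs only once; expensive_report.cache_info() reports the hit and miss counts.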


02 · REAL-WORLD EXAMPLES

Real-World Examples

Before diving into the mechanics, it helps to understand why caching matters at production scale. Let's go through three examples that illustrate the concept and the two core scenarios where caching always shows up. After these examples, you'll start to see the pattern everywhere.

Example 1 — Google Search

Pretty much all of us use or have used Google Search in our browsers. What exactly happens when you type something into the Google search bar and hit Enter?

That query is processed by Google's search engine through a pretty complex algorithm and workflow. Every query goes through a pipeline that typically involves:

This whole process is computationally expensive: it takes a lot of computing power, CPU time, and memory.

Now consider a query like "what is the weather today". Queries like this are searched millions of times every day. Without caching, Google's servers would need to recompute the results for every single query: each weather search would mean going through the entire index, running all the ranking algorithms, and fetching the results. That would significantly slow down response times and put a very high load on the servers.

What Google Does Instead

Google uses a distributed in-memory caching system to store the results. The key word here is distributed: the cache servers are not concentrated in a single location but spread all over the world. They store whatever results the ranking algorithms and the rest of the Google Search workflow return.

When a user searches, the system first checks whether the results of that particular query are present in the cache or not:

[Figure: User query ("weather in Delhi") → distributed in-memory cache on global servers. Cache hit: return instantly. Cache miss: the full pipeline runs (crawl → rank → results), then the result is stored in the cache.]
Fig 1.1 — Google Search: Cache Hit vs Cache Miss flow with result storage loop

Let's trace through both paths precisely:

Key Vocabulary Introduced

Cache Hit — When you look for data in the cache and find it. Fast path. No recomputation needed.

Cache Miss — When you look for data in the cache and it's not there. Must go to the primary source, compute the result, then store it for future use.
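The hit and miss paths can be sketched with a dictionary-backed cache. This is only an illustration of the vocabulary, assuming a hypothetical run_search_pipeline function in place of the real search workflow; it says nothing about how Google's cache is actually built.

```python
class SimpleCache:
    """Dictionary-backed cache that records hits and misses."""

    def __init__(self, compute):
        self._compute = compute   # function that produces a result on a miss
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._store:            # cache hit: fast path
            self.hits += 1
            return self._store[key]
        self.misses += 1                  # cache miss: go to the primary source
        value = self._compute(key)
        self._store[key] = value          # store it for future requests
        return value


def run_search_pipeline(query: str) -> str:
    """Hypothetical stand-in for the full, expensive search pipeline."""
    return f"results for {query!r}"


cache = SimpleCache(run_search_pipeline)
cache.get("weather in Delhi")   # miss: the pipeline runs, the result is stored
cache.get("weather in Delhi")   # hit: returned straight from the cache
```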

Example 2 — Netflix & CDN

Netflix is a huge and global streaming platform which delivers different kinds of content — movies, series, anime, etc. — to millions of users all over the world. It streams large volumes of data — and when we say large, it can be multiple terabytes — because of the way these streaming platforms work.

How Netflix Stores a Single Movie

For a single video — let's say a single movie called Movie One — it goes through a process called encoding. It prepares different resolutions for different devices and different network speeds. For a high level explanation, let's say it has:

Depending on your network speed and which device you're using, Netflix dynamically sends an optimised version of that content so that you don't waste your bandwidth and the load on Netflix's servers also decreases. That's all about encoding. But the real question is: how does Netflix actually deliver hundreds and thousands of terabytes of data to millions of users spread across the whole world, with minimal buffering?

The Answer: CDN (Content Delivery Network)

Netflix has its own origin servers. Let's say they're somewhere in the US, in different data centers and locations, with server racks that store the actual movies. But Netflix goes the extra mile: all over the world, at different locations, it places what are called Edge Locations.

These are called Edge Locations because the servers are strategically placed so that latency for users in that region is minimal. If there is a server in India, then requests from Indian users served by that server see far lower latency than requests that travel all the way to the origin server in the US.

Think about what happens without this: everyone in India who wants to watch a movie on Netflix would be sending requests all the way to the US data center. Geographically speaking, that's a long distance for data to travel. The response time would be high, and you'd experience buffering. With an edge server in Mumbai or Chennai, that distance shrinks to a small fraction of the original.

[Figure: Origin server (Netflix US data center, actual movies stored here) feeding edge PoPs that cache a subset of content: Europe (Frankfurt/London), India (Mumbai/Chennai), SE Asia (Singapore/Tokyo), Americas (NY/LA/São Paulo). Each PoP serves nearby users; cached content is TTL-based and ML-selected.]
Fig 1.2 — Netflix CDN: Content distributed from origin to Edge/PoP servers globally, served to nearby users

Key terms from this example — these are important vocabulary you'll use throughout your career:

Netflix's Smart Caching — Not Everything Gets Cached Everywhere

Netflix does not cache all its data in all the edge locations. That would incur enormous cost and also require a lot of resources. Instead, they use machine learning algorithms, trend analysis, real-time regional data, and a lot of other complex computations to decide what subset of data to cache at each specific edge location. A server in Mumbai might cache popular Bollywood films and trending Indian shows, while a server in Tokyo caches anime and J-dramas. This is smart, regional, data-driven caching — not brute-force replication.

CDN is not only for video streaming. Platforms like Vercel use the exact same strategy to serve static web assets — JavaScript bundles, HTML files, CSS, images — from the edge closest to the requesting user. When you deploy a Next.js app on Vercel, it's automatically distributed to their global edge network. That's why Vercel deployments load almost instantaneously regardless of where in the world you are. Same principle, same mechanism, different type of content.

Example 3 — Twitter / X (Trending Topics)

Let's take a platform called X, previously known as Twitter. You can apply this example to any social media platform — Facebook, LinkedIn, YouTube — they all implement the same kind of strategy.

If you're familiar with how Twitter works, it has a section called Trending Topics. Twitter identifies trending topics by analyzing enormous volumes of tweets in real time. It analyzes the tweets that people are posting all over the world, extracts patterns and trends, and calculates what is trending.

Why This Computation is Expensive

This calculation is very expensive. It involves:

What Happens Without Caching

Imagine if Twitter did all this calculation every time a user opened the trending section. If even half of Twitter's hundreds of millions of users tried to access the trending section, and every request triggered this expensive calculation, the servers could not handle it; they would fall over within minutes or seconds. Triggering the entire ML pipeline on every single request is impossible to sustain.

What Twitter Actually Does

To avoid doing this heavy computation for each request, Twitter caches the trending topics. We don't know the exact algorithm or refresh interval Twitter uses, so take this as a rough estimate for the sake of the example: every few minutes, Twitter takes the data from different regions, runs its machine learning and trend detection algorithms over it, and stores the results in an in-memory key-value store like Redis.

When users request the trending section, instead of computing it all again, it just takes that data from the cache and sends it to the user. That is the reason the moment you open your phone you get that data instantly — you do not see any kind of significant loading time. If you have a generally fast internet connection, the whole UI interaction is very fast.

Why Trending Topics is Safe to Cache

For a trend to change in a particular region — let's say your country has ongoing elections, and elections are the trending topic — that's not something which is subject to change in seconds or minutes. At the very least it will stay in the trending section for a couple of hours or a couple of days. Since this data is not dynamically changed on a minute-by-minute basis, it is very safe to cache it. The TTL can be set to a few minutes or even longer without any noticeable loss of freshness for the user.
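The "compute once, serve from the cache until the TTL runs out" idea can be sketched as below. This is a toy model under stated assumptions: TTLCache, compute_trends, the key name "trending:IN", and the 300-second TTL are all hypothetical, and the clock is injected so the expiry can be demonstrated without actually waiting.

```python
import time


class TTLCache:
    """Caches one computed value per key and recomputes it after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be shown without sleeping
        self._store = {}           # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                      # still fresh: serve the cached value
        value = compute()                        # expired or missing: recompute
        self._store[key] = (now + self._ttl, value)
        return value


fake_now = [0.0]   # fake clock, in seconds
runs = []          # records the times at which the expensive job actually ran


def compute_trends():
    """Hypothetical stand-in for the expensive trend-detection job."""
    runs.append(fake_now[0])
    return ["elections", "cricket"]


cache = TTLCache(ttl_seconds=300, clock=lambda: fake_now[0])
cache.get_or_compute("trending:IN", compute_trends)   # computed at t=0
fake_now[0] = 120.0
cache.get_or_compute("trending:IN", compute_trends)   # within the TTL: cached
fake_now[0] = 301.0
cache.get_or_compute("trending:IN", compute_trends)   # TTL expired: recomputed
```

However many requests arrive inside the TTL window, the expensive job runs only once per window.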

The Pattern — Recognising When to Cache

After those three examples, you should now see the pattern clearly. Every time there is a situation where it's either about:

These are the two common scenarios when caching comes into play. Recognise either of these in a system design discussion, and caching is almost always part of the solution.


03 · TAXONOMY

Levels of Caching

Caching exists at multiple levels of a computer system. As a backend engineer, you'll encounter three levels most frequently:

[Figure: 1. Network level (CDN, DNS). 2. Hardware level (L1, L2, L3 cache, RAM). 3. Software level (Redis, Memcached, ElastiCache).]
Fig 2.1 — The three primary levels of caching a backend engineer encounters

Note on "Software-based" caching

"Software-based" doesn't mean purely software. Redis uses the hardware's RAM. It's called software-based because you interact with it via a library or API — the means of interaction is software. The performance still comes from the underlying hardware (RAM).


04 · NETWORK LEVEL

Network Level Caching

The two major use cases at the network layer that backend engineers deal with are CDN and DNS caching.

4.1 — CDN (Content Delivery Network)

The core idea of a CDN is to cache content on servers geographically closer to the end users. Any server placed close to the user at the "edge" of the network is called an Edge Node or Edge Server; running workloads on such servers is known as Edge Computing.

How a CDN request flows

[Figure: 1. The user's browser enters a URL. 2. CDN DNS resolves the nearest PoP. 3. The edge server (PoP in the nearest region) checks its cache. Cache hit: serve directly to the user. Cache miss: fetch from the origin server (source of truth), then cache the response with a TTL. TTL (Time to Live): each cached item has an expiry duration; after the TTL expires, fresh content is re-fetched from the origin.]
Fig 3.1 — Full CDN Request Flow: DNS → PoP → Cache Hit/Miss → Origin
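The hit/miss part of this flow can be sketched with one origin and a cache per edge server. The Origin and EdgeServer classes, the region names, and the path are hypothetical; DNS-based PoP selection and TTL expiry are left out to keep the sketch short.

```python
class Origin:
    """Hypothetical origin server: the source of truth for all content."""

    def __init__(self, files):
        self._files = files
        self.requests = 0          # how many times the origin was contacted

    def fetch(self, path):
        self.requests += 1
        return self._files[path]


class EdgeServer:
    """Hypothetical edge/PoP server that caches what its region requests."""

    def __init__(self, region, origin):
        self.region = region
        self._origin = origin
        self._cache = {}

    def serve(self, path):
        if path not in self._cache:                    # miss: go to the origin once
            self._cache[path] = self._origin.fetch(path)
        return self._cache[path]                       # hit: served from the edge


origin = Origin({"/movie-one/1080p": b"video bytes"})
edges = {region: EdgeServer(region, origin) for region in ("india", "europe")}

for _ in range(1000):                       # 1000 viewers in India
    edges["india"].serve("/movie-one/1080p")
edges["europe"].serve("/movie-one/1080p")   # first viewer in Europe
```

In this sketch the origin is contacted once per region rather than once per viewer.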

CDN routing decisions consider multiple parameters:

4.2 — DNS Caching

DNS (Domain Name System) translates human-readable domain names (like example.com) into IP addresses that browsers use to connect to servers. This resolution process, without caching, is deeply recursive and slow.

The DNS Resolution Chain (without cache)

[Figure: User device (types URL) → recursive resolver (ISP / Google / Cloudflare) → root server (13 named root servers, each replicated at many sites worldwide) → TLD server (.com / .in / .org) → authoritative name server (has the IP). The recursive resolver goes deep until it finds the IP. Example: example.com → .com TLD → example.com authoritative server → IP.]
Fig 3.2 — DNS Resolution Chain: User → Resolver → Root → TLD → Authoritative Name Server

This entire recursive journey is expensive. DNS solves this with multiple levels of caching:

[Figure: Each level is checked before proceeding to the next. Level 1: OS DNS cache (Windows/Mac/Linux). Level 2: browser cache (Chrome, Firefox, etc.). Level 3: recursive resolver (ISP / Google DNS / Cloudflare). Fallback: full resolution (root → TLD → authoritative).]
Fig 3.3 — DNS Cache Hierarchy: four levels before a full resolution is triggered

The Recursive Resolver is called "recursive" because it recursively queries different servers (root → TLD → authoritative) until it finds the answer. It is provided by your ISP or public DNS providers like Google (8.8.8.8) or Cloudflare (1.1.1.1).
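The layered lookup can be sketched as a list of caches that are checked in order before falling back to a full resolution. This is a simplified model: the layer order is taken from Fig 3.3 and is passed in by the caller so it can be rearranged, TTLs are ignored, full_resolution is a hypothetical stand-in for the root → TLD → authoritative walk, and 192.0.2.10 is a documentation-only address rather than a real record.

```python
def resolve(domain, cache_layers, full_resolution):
    """Check each cache layer in order; fall back to a full resolution.

    cache_layers is an ordered list of (name, dict) pairs.
    Returns (ip_address, name_of_the_layer_that_answered).
    """
    hit_index = len(cache_layers)
    answered_by = "full resolution"
    ip = None
    for index, (name, cache) in enumerate(cache_layers):
        if domain in cache:
            ip, answered_by, hit_index = cache[domain], name, index
            break
    if ip is None:
        ip = full_resolution(domain)          # root -> TLD -> authoritative
    for _, cache in cache_layers[:hit_index]: # fill every layer that missed
        cache[domain] = ip
    return ip, answered_by


def full_resolution(domain):
    """Hypothetical stand-in for the expensive recursive lookup."""
    return {"example.com": "192.0.2.10"}[domain]


layers = [("OS cache", {}), ("browser cache", {}), ("recursive resolver", {})]
first = resolve("example.com", layers, full_resolution)    # nothing cached yet
second = resolve("example.com", layers, full_resolution)   # answered by the first layer
```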


05 · HARDWARE LEVEL

Hardware Level Caching

CPU Cache Hierarchy (L1, L2, L3)

The CPU doesn't read directly from RAM for every operation — that would be too slow. Instead, it maintains a hierarchy of smaller, faster memories called cache levels:

[Figure: Nested memory hierarchy, from fastest and smallest to slowest, largest, and cheapest: L1 cache → L2 → L3 (shared) → RAM (main memory) → HDD / SSD / network.]
Fig 4.1 — CPU Memory Hierarchy: Nested levels from fastest/smallest (L1) to slowest/largest (HDD)

Why Arrays are Faster for Sequential Access

An important practical consequence of CPU caching: memory is loaded into the cache one cache line at a time (typically 64 bytes), not one value at a time, so reading one array element also brings its neighbours into the cache. When you traverse an array sequentially, the CPU's hardware prefetcher also detects the pattern and starts loading the upcoming cache lines into L1/L2 before they are requested. This is why for loops over arrays are extremely fast: most of the data is already in the cache by the time the CPU needs it.
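The effect of access order can be illustrated with a toy cache model. This is not a measurement of real hardware: it assumes 64-byte cache lines, 8-byte elements, a tiny 16-line LRU cache, and no prefetcher, and it simply counts how many accesses would have to load a new line from RAM.

```python
from collections import OrderedDict

LINE_BYTES = 64      # assumed cache line size
ELEMENT_BYTES = 8    # e.g. one 64-bit integer per array element


def count_misses(indices, cache_lines=16):
    """Toy LRU cache model: count accesses that need a new line from RAM."""
    cache = OrderedDict()
    misses = 0
    for i in indices:
        line = (i * ELEMENT_BYTES) // LINE_BYTES
        if line in cache:
            cache.move_to_end(line)          # hit: the line is already cached
        else:
            misses += 1                      # miss: load the whole 64-byte line
            cache[line] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)    # evict the least recently used line
    return misses


n = 1024
sequential = list(range(n))
# Same elements, visited with a stride of 8 (one element per cache line per pass).
strided = [i for start in range(8) for i in range(start, n, 8)]

seq_misses = count_misses(sequential)
strided_misses = count_misses(strided)
```

Both orders touch the same 1024 elements, but in this model the sequential walk misses once per cache line while the strided walk misses on every access.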

RAM vs Disk — Why RAM is Faster (The Physics)

This is the fundamental reason why in-memory databases like Redis are so much faster than disk-based databases like PostgreSQL or MySQL. Understanding the actual hardware difference is important — it's not magic, it's physics and engineering.

How Hard Disk Storage Works

In a traditional hard disk drive (HDD), data lives on spinning platters, and a read/write head mounted on a mechanical arm moves across them. When you want to read data, the arm moves the head to the right track and then waits for the spinning platter to bring the right sector under it. It is a mechanical operation: a literal physical arm has to move to a location on a spinning platter. This movement takes milliseconds, which in computing terms is an eternity.

How RAM Works

Random Access Memory is fundamentally different. It is built from capacitors and transistors, and any location can be reached directly by putting its address on the memory bus. There is no physical movement, no seeking, no spinning. The access is purely electrical, so it completes in tens of nanoseconds rather than milliseconds.

This is also why it's called Random Access Memory: it does not matter in what order you access the data. The access time is almost constant regardless of where in memory the data sits. Whether you access address 0x0001 or address 0xFFFF, the time to retrieve it is essentially the same. In data structures terminology, this property is called O(1) access time.

Compare that to a hard disk: accessing data sequentially (from address 0 to 100) is fast because the head doesn't have to move much. But accessing data randomly (jumping from address 0 to 9000 to 200 to 7000) is extremely slow because the head has to physically seek to different locations each time. That's why HDDs prefer sequential access patterns.

Property | RAM (Primary Storage) | HDD (Secondary Storage) | SSD (Secondary Storage)
Access mechanism | Electrical signal → memory address | Mechanical head seeks spinning platter | Flash memory cells, electrical
Access time | ~60–100 nanoseconds | ~5–10 milliseconds | ~50–150 microseconds
Relative speed | ~100,000× faster than HDD | Baseline | ~100× faster than HDD
Random access | O(1), constant regardless of address | Slow, mechanical seek required | Good, no mechanical parts
Volatility | Volatile, data lost on power off | Persistent, survives power off | Persistent, survives power off
Capacity (typical server) | 64GB – 512GB | Multiple TBs | Multiple TBs
Cost per GB | ~$5–10/GB | ~$0.02–0.05/GB | ~$0.10–0.20/GB
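As a quick sanity check on the "Relative speed" row, the ratios can be worked out from the access times in the table above. The midpoints used here are an assumption for the sake of the arithmetic; real devices vary widely.

```python
# Midpoints of the access-time ranges from the table, in seconds.
ram = 80e-9      # ~60-100 nanoseconds
ssd = 100e-6     # ~50-150 microseconds
hdd = 7.5e-3     # ~5-10 milliseconds

ram_vs_hdd = hdd / ram   # how many RAM accesses fit into one HDD access
ssd_vs_hdd = hdd / ssd
ram_vs_ssd = ssd / ram

print(f"RAM is ~{ram_vs_hdd:,.0f}x faster than HDD")
print(f"SSD is ~{ssd_vs_hdd:,.0f}x faster than HDD")
print(f"RAM is ~{ram_vs_ssd:,.0f}x faster than SSD")
```

With these midpoints the RAM-to-HDD ratio comes out just under 100,000 and the SSD-to-HDD ratio just under 100, which is consistent with the rounded figures in the table.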

The Fundamental Tradeoff

When it comes to Random Access Memory, we are trading non-volatility and capacity for speed. That is the tradeoff stated plainly. RAM is incredibly fast but:

This is why you cannot completely replace a hard disk or other disk-based storage with RAM. RAM has its own role: it is fast for data access and retrieval, but it is not a replacement for secondary storage. Data in secondary storage is non-volatile. It does not matter whether your program is accessing it or not, or whether your computer is on or off: the data persists because it is physically written to the disk.

Primary vs Secondary Storage — The Summary

Primary storage (RAM) — Very fast data access, limited capacity, volatile (data lost on power off). Used for data that needs to be processed right now or retrieved instantly.

Secondary storage (HDD/SSD) — Slower data access, abundant capacity, non-volatile (data persists). Used for permanent storage of all your data.

How Redis Bridges Both Worlds

Technologies like Redis and Memcached make use of this Random Access Memory (primary memory / main memory) to store their data — that is why data access operations from these databases are very fast. But what about persistence? What about the fact that RAM loses data on power-off?

Behind the scenes, Redis also makes use of secondary storage for persistence (Memcached, by contrast, is designed as a purely in-memory cache and does not offer this kind of durability by default). When the program starts, it reads the data back from secondary storage and loads it into main memory again, so that your data survives a restart. But when you actually retrieve or modify data, that happens in primary memory. The in-memory database is responsible for implementing this persistence layer, whether through Redis's RDB snapshots or its AOF (Append-Only File) log.


06 · IN-MEMORY DATABASES

In-Memory Key-Value NoSQL Databases

Coming back to backend development, this is where technologies like Redis and Memcached come into play, along with managed cloud offerings such as AWS ElastiCache. The storage they provide is based on primary memory (RAM), which is why data access operations on these databases are very fast.

We call these technologies in-memory key-value NoSQL databases. That name has four parts, and each one tells you something important:

In-Memory

As compared to traditional databases like PostgreSQL or MySQL, these are not stored on disk. The storage is based on RAM (primary storage). That is the reason data access operations are extremely fast — we've just understood in the hardware section exactly why RAM is orders of magnitude faster than disk-based access. Redis reads and writes to RAM, not disk.

Key-Value

Traditional relational databases have a strict schema: you have to create tables, define columns with types, insert rows, and so on. Here the data model is very simple. You have keys and values. For a given key you can store a string, a number, a list, a hash, a set, or a serialized JSON object. Different technologies offer different data types, but the interface is always: give me a key, I'll give you its value.

NoSQL

They don't enforce the strictness of traditional SQL databases. No schemas, no joins, no complex queries. The API is intentionally simple. In Redis, you essentially have SET key value and GET key as your primary operations. It's not complex like SQL queries with aggregation, GROUP BY, etc. — it's pretty straightforward to access.

Database

Despite being "just" a key-value store, these are fully-fledged databases with features like TTL-based expiry, persistence options, pub/sub messaging, Lua scripting (Redis), clustering and replication. The "database" label is earned — they manage data reliably, not just as an ephemeral cache.

Why the Simplicity of Key-Value is a Feature, Not a Limitation

You might wonder: why would I use a database with no complex queries? The answer is that all that complexity you don't have is performance you get back. When Redis receives a GET command, it literally just looks up a hash table in memory. There's no query parsing, no query planning, no disk I/O, no index traversal. It's an O(1) memory lookup — that's why Redis can serve millions of operations per second.

This is what you as a backend engineer will deal with: you pick a client library for your programming language (Node.js has node-redis or ioredis, Go has go-redis, Python has redis-py) and use it. You provide a key and a value to store data, and you provide the key to get the value back. It is pretty straightforward, with none of the complexity of SQL queries and aggregation. Still, knowing how the technology works behind the scenes and what its major components are helps you make sense of the whole thing and make better decisions.

How Redis Handles Persistence

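Redis uses two primary mechanisms to ensure data isn't lost when the process restarts: RDB snapshots, which periodically write a point-in-time copy of the dataset to disk, and the AOF (Append-Only File), which logs each write operation so the dataset can be rebuilt by replaying the log on startup.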

The key insight: data is always read from and written to RAM during normal operation. Disk is only involved for persistence (saving state for recovery). The hot path — the path every user request takes — never touches disk.
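The "serve from RAM, log to disk, replay on startup" idea can be sketched with a toy key-value store. This only illustrates the principle of an append-only log; it is not how Redis implements AOF, and TinyStore, its log format, and the key names are hypothetical.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy in-memory key-value store with an append-only log for recovery."""

    def __init__(self, log_path):
        self._log_path = log_path
        self._data = {}                       # all reads are served from here (RAM)
        if os.path.exists(log_path):          # on startup, rebuild state from disk
            with open(log_path, encoding="utf-8") as log:
                for line in log:
                    key, value = json.loads(line)
                    self._data[key] = value

    def set(self, key, value):
        with open(self._log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps([key, value]) + "\n")   # append the write to the log
        self._data[key] = value                          # then update memory

    def get(self, key):
        return self._data.get(key)            # never touches the disk


log_path = os.path.join(tempfile.mkdtemp(), "store.log")
store = TinyStore(log_path)
store.set("user:1", "Asha")
store.set("user:1", "Asha K")

restarted = TinyStore(log_path)               # simulates a process restart
```

After the simulated restart, the new instance rebuilds the latest value of every key by replaying the log, while every get is still answered from memory.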


07 · STRATEGIES

Caching Strategies

There are two primary caching strategies you'll encounter in day-to-day backend development. They answer different questions: when do you populate the cache? and when do you update it?

Strategy 1

Lazy Caching (Cache-Aside)

Cache is populated only when data is first requested. Proactive pre-filling is not done.

Strategy 2

Write-Through Caching

Every write to the database is written to the cache in the same operation, so the cache stays in step with the database.

Strategy 1 — Lazy Caching (Cache-Aside)

[Figure: Client → server checks the cache. Hit: return instantly from the cache. Miss: fetch from the DB, store the result in the cache, and return it, so the cache is populated for future requests.]
Fig 5.1 — Lazy (Cache-Aside) Caching: populate on first miss
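The flow in Fig 5.1 can be sketched as a single read function. This is a minimal illustration using a plain dict as the cache and a hypothetical CountingDB class as the database; the key format "user:<id>" is also an assumption.

```python
class CountingDB:
    """Hypothetical database stand-in that counts how often it is read."""

    def __init__(self, rows):
        self._rows = rows
        self.reads = 0

    def get(self, user_id):
        self.reads += 1
        return self._rows.get(user_id)


def get_user(user_id, cache, db):
    """Cache-aside read: check the cache first, fall back to the database."""
    key = f"user:{user_id}"
    user = cache.get(key)
    if user is not None:
        return user                 # hit: the database is not touched
    user = db.get(user_id)          # miss: read from the primary source
    if user is not None:
        cache[key] = user           # populate the cache for future requests
    return user


db = CountingDB({42: {"name": "Asha"}})
cache = {}
get_user(42, cache, db)   # miss: reads the database and fills the cache
get_user(42, cache, db)   # hit: served from the cache
```

Nothing is cached until someone asks for it, which is what makes the strategy "lazy".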

Characteristics of Lazy Caching:

Strategy 2 — Write-Through Caching

Every time a write operation (POST, PUT, PATCH) changes data in the database, the same change is simultaneously applied to the cache within the same API call execution flow.

Advantage

The cache stays fresh. Reads served from the cache reflect the latest write, because the cache is updated in the same flow as the database, provided both updates succeed.

Tradeoff

Every write operation carries additional overhead — you must update both the database and the cache atomically (or near-atomically). This increases latency of write operations. If write operations are very frequent, this can become a bottleneck.
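A minimal sketch of write-through, assuming plain dicts stand in for both the database and the cache. WriteThroughStore and the key "stock:macbook" are hypothetical names used only for illustration.

```python
class WriteThroughStore:
    """Applies every write to the database and the cache in the same call."""

    def __init__(self, db, cache):
        self._db = db
        self._cache = cache

    def write(self, key, value):
        self._db[key] = value        # 1. write to the source of truth first
        self._cache[key] = value     # 2. then update the cache in the same flow

    def read(self, key):
        if key in self._cache:
            return self._cache[key]
        value = self._db.get(key)    # only reached for keys not written via write()
        if value is not None:
            self._cache[key] = value
        return value


db, cache = {}, {}
store = WriteThroughStore(db, cache)
store.write("stock:macbook", 12)
store.write("stock:macbook", 11)     # the cache is updated together with the database
```

With two real systems the two steps are not atomic: if the second step fails after the first succeeds, the cache is briefly out of date, so production code usually needs a retry or an invalidation fallback.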

When to use which?

Scenario | Strategy
Read-heavy, infrequent writes (product pages, profiles) | Lazy caching + TTL
Needs always-fresh cache (financial data, inventory) | Write-through
Unknown access patterns, gradual rollout | Lazy caching (safer start)
Heavy write workload (logging, event streams) | Avoid write-through

08 · EVICTION POLICIES

Eviction Policies

Something else we should be aware of when working with in-memory caches like Redis is the eviction policy. What does it mean? Let's understand the problem first.

Why Eviction is Necessary

When you have a cache — and as we already know, in-memory caches like Redis use primary storage (RAM), which is limited in capacity compared to secondary storage — it is pretty obvious that at one point you'll run out of memory. Whether you're running Redis on your own server and the RAM fills up, or you're using a managed service like AWS ElastiCache which has a storage limit — at some point you will hit the cap.

At that point you have a decision to make: you want to store new data in the cache, but there's no room, so you have to delete something old to make room for something new. As discussed at the start of this topic, a cache holds only a subset of the data: the frequently accessed part, stored in a different, faster location. The keyword to focus on is subset. We cannot fit the whole primary data source in the cache, so we have to decide what stays and what goes.

The Core Question of Eviction

Which piece of cached data is least valuable to keep? Evict that, make room for the new data which has higher priority. Different eviction policies answer this question differently.

[Figure: LRU example. The cache is full with 4 keys: Key 1, Key 2, and Key 3 were last accessed today; Key 4 was last accessed yesterday (oldest). Key 5 arrives and needs space, so the cache evicts Key 4 and inserts Key 5; the total remains at 4 keys.]
Fig 6.1 — LRU Eviction: Key 4 (accessed yesterday, least recently used) evicted to make room for Key 5
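The scenario in Fig 6.1 can be reproduced with a small LRU cache built on collections.OrderedDict. This is a teaching sketch of exact LRU, not a description of how any particular cache server implements its eviction; LRUCache and the key names are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity: int):
        self._capacity = capacity
        self._items = OrderedDict()   # oldest access first, newest last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)   # drop the least recently used key

    def keys(self):
        return list(self._items)


cache = LRUCache(capacity=4)
for k in ("key4", "key1", "key2", "key3"):   # key4 is inserted first...
    cache.put(k, k.upper())
for k in ("key1", "key2", "key3"):           # ...and never accessed again
    cache.get(k)
cache.put("key5", "KEY5")                    # the cache is full: key4 is evicted
```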

09 · USE CASES

Redis Use Cases in Backend Development

Now that we have all the theoretical grounding, let's look at the concrete use cases where Redis and in-memory databases are used in a typical backend engineering workflow. These are the situations you'll encounter in real projects.

9.1 — Database Query Caching

One of the primary use cases. Let's say you have an SQL query with a lot of JOINs: it joins multiple tables, does a lot of aggregation, and finally ends up with a few rows. It is a very compute-intensive operation because the dataset is large (let's say millions of rows). You've also noticed through monitoring that the API that runs this query is hit pretty frequently; maybe it backs your landing page or a dashboard page that a lot of users open.

What happens to your database without caching? Every single user hitting that page triggers the same expensive multi-table JOIN. With a thousand concurrent users, you're making a thousand identical expensive queries to the database. This puts enormous load on the DB server and increases the API response latency for everyone.

What you do with caching: You take that particular query result, cache it with some TTL (say 1 hour), and from that point on — when the next request comes, you check if the result is present in the cache. If yes, serve it from there. Otherwise do the calculation once, store it in the cache, and return it. Whenever some modification happens to the underlying data, you can manually invalidate the cache or delete it, and the next request will recompute it fresh.
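The steps above can be sketched with an in-memory SQLite database standing in for the real one and a dict standing in for the cache. The table names, the cache key, and the tiny dataset are hypothetical; only the 1-hour TTL comes from the text.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users  VALUES (1, 'IN'), (2, 'IN'), (3, 'US');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 2, 15.0), (3, 3, 7.5);
""")

REVENUE_SQL = """
    SELECT u.country, SUM(o.total)
    FROM orders o JOIN users u ON u.id = o.user_id
    GROUP BY u.country ORDER BY u.country
"""
CACHE_KEY = "dashboard:revenue_by_country"   # hypothetical key name
TTL_SECONDS = 3600                           # the 1-hour TTL from the text
query_cache = {}                             # key -> (expires_at, rows)
db_queries = 0                               # how often the query really ran


def revenue_by_country():
    global db_queries
    entry = query_cache.get(CACHE_KEY)
    if entry and entry[0] > time.monotonic():
        return entry[1]                              # served from the cache
    db_queries += 1
    rows = conn.execute(REVENUE_SQL).fetchall()      # the expensive query runs
    query_cache[CACHE_KEY] = (time.monotonic() + TTL_SECONDS, rows)
    return rows


def add_order(user_id, total):
    conn.execute("INSERT INTO orders (user_id, total) VALUES (?, ?)", (user_id, total))
    query_cache.pop(CACHE_KEY, None)                 # invalidate on modification


revenue_by_country()
revenue_by_country()              # second call is a cache hit
add_order(3, 2.5)
latest = revenue_by_country()     # recomputed after the invalidation
```

Three calls to revenue_by_country result in only two database queries, and the third call sees the new order because the write invalidated the cached entry.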

The Amazon Product Page Example

This is one of the best real-world illustrations. Imagine there is a sale going on for a MacBook on Amazon. During the sale period, millions of users hit that particular web page. If Amazon did not cache the details of that product, then fetching the MacBook's images, product descriptions, specifications, and reviews would all be database operations, repeated for every visitor.

With millions of concurrent users on the sale, the database would get a million identical requests for the exact same data, and that puts significant load on the database for absolutely no reason — because the information like product details for a MacBook does not change very often. Product descriptions, images, specs — these are static data. They might change once a month, if that. That makes them an excellent candidate for caching.

So Amazon caches static data like product details and prices so that they can reduce the load on the database, and the database can actually do the important work — like handling checkout transactions, inventory updates, and order management — instead of spending all its capacity serving the same static product detail page over and over.

Social Media Profile Caching

Social media platforms like Twitter and Facebook also cache user profile data. Think about it — user profile data is not something that changes very often. Maybe a couple of times a year. That's the reason they cache user profile data so that every time that data is fetched it is served from the cache instead of from the database.

Now imagine if it is the social media profile of some celebrity. That particular page and that particular API for fetching the user profile details of that celebrity might get hit a thousand times per day normally — or if they have an upcoming movie, maybe a million times a day. In that case, putting all that load on the database makes absolutely no sense, since that user profile information is pretty static most of the time. It can serve that content from cache, and even if the user makes some change to their profile, you can invalidate the cache and put the new entry. This is a very read-heavy operation with very infrequent writes — the ideal scenario for caching.

The Read-Heavy Pattern

Whenever we have a read-heavy operation and the write is pretty infrequent, we can make use of caching. Database query caching is one of the primary examples of when we use technologies like Redis or in-memory databases. The pattern: reads are frequent, data rarely changes, computation is expensive → cache it.

9.2 — Session Token Storage

If you've watched the authentication video in this playlist, you might be aware of this. In a typical authentication flow, after a successful authentication, a session token is generated for that particular user and that session token is stored in some kind of storage.

Ideally, it is stored in Redis or an in-memory database — not in your main relational database. Here's why:

Every time the user makes a request to any endpoint on your server, you have to validate that session token: fetch the session information and check that it is valid. Without Redis, you would have to fetch it from your database on every single API call. And as you already know, reading from memory (Redis) is much faster than running a query against a disk-backed relational database.

Consider what happens at scale: if you have 100,000 concurrent active users, each making multiple API calls per minute, that's potentially millions of database queries per minute just for session validation — queries that return the exact same data (the session is valid, user ID is X) for the same session token again and again. This puts unnecessary load on your database and adds latency to every single API endpoint in your application.
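To make the scale concrete, here is the arithmetic as a small sketch. The figure of 100,000 users comes from the paragraph above; the per-user call rates are assumptions chosen only for illustration.

```python
def session_lookups_per_minute(active_users, calls_per_user_per_minute):
    """Each authenticated API call triggers exactly one session lookup."""
    return active_users * calls_per_user_per_minute


# 100,000 users is from the text; the call rates below are assumed values.
for rate in (5, 10, 20):
    total = session_lookups_per_minute(100_000, rate)
    print(f"{rate} calls/user/min -> {total:,} lookups/min ({total / 60:,.0f}/s)")
```

Even at a modest 10 calls per user per minute, that is one million lookups per minute that never need to touch the relational database.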

With Redis, the session token is the key and the user's session data (user ID, permissions, metadata) is the value. Validation is a single GET session:token_id, an O(1) in-memory lookup that typically completes in well under a millisecond, network round trip included. The session also expires naturally via TTL, so you don't need cleanup jobs.

9.3 — External API Response Caching

In your backend, you are making use of some external API — let's say some weather API — and you are taking the information from that and doing some kind of computation to serve your own frontend. Now every time your frontend makes a request to your API, if you do not make use of caching, you also make another request to the weather API to fetch the weather data.

If you have a lot of users and they are making multiple API calls, you end up making thousands of API calls to this external API. External APIs usually have rate limits and per-request costs, so every redundant call eats into a quota or a bill.

In this case the weather data is not real-time data in the sense that it changes every second. Weather data does not change every second or minute — that's why it is a kind of data that is safe to cache. What you do: you fetch that information from the weather API, cache it in Redis with a TTL of 1 hour, and for the next 1 hour all the requests from your frontend will use the cached weather data. After an hour, the cache automatically invalidates, and the next time a request comes you'll fetch fresh weather data, put it back in the cache with a new TTL, and return it. For the following hour, all requests use that fresh cached version again.
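The expiry cycle described above can be simulated without a Redis server. In this sketch, `TTLCache` is a tiny in-memory stand-in for Redis SETEX/GET with an explicit clock, and `call_weather_api` is a hypothetical stub that only counts how often the upstream API would be called.

```python
class TTLCache:
    """In-memory stand-in for Redis SETEX/GET; `now` is passed in explicitly."""

    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def get(self, key, now):
        entry = self._store.get(key)
        if entry is None or now >= entry[1]:
            self._store.pop(key, None)  # expired entries behave like missing keys
            return None
        return entry[0]

    def setex(self, key, ttl_seconds, value, now):
        self._store[key] = (value, now + ttl_seconds)


upstream_calls = 0


def call_weather_api(city):
    """Hypothetical external call; here it only counts invocations."""
    global upstream_calls
    upstream_calls += 1
    return {"city": city, "temp_c": 21}


def get_weather(cache, city, now, ttl_seconds=3600):
    key = f"weather:{city}"
    cached = cache.get(key, now)
    if cached is not None:
        return cached  # served from cache, no upstream call
    data = call_weather_api(city)
    cache.setex(key, ttl_seconds, data, now)
    return data


weather_cache = TTLCache()
# One request per minute for two hours (timestamps in seconds).
for minute in range(120):
    get_weather(weather_cache, "London", now=minute * 60)
```

With a 1-hour TTL, the 120 requests in this run reach the upstream API only twice: once at the start and once when the first entry expires.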

When External API Caching is Appropriate

Ask yourself: how often does this data actually change in a meaningful way? Weather → hourly. Exchange rates → every few minutes. Stock prices → every second (too volatile to cache). News headlines → every hour. The answer determines your TTL. If data changes slower than your traffic rate, cache it.
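One rough way to reason about the rule of thumb above is to estimate the hit ratio for a single key. The sketch below assumes steady, evenly spaced traffic and exactly one miss per TTL window, which is a simplification; the request rate of 10 per second and the specific TTL values are illustrative assumptions.

```python
def expected_hit_ratio(requests_per_second, ttl_seconds):
    """Approximate hit ratio for one key under steady traffic.

    Assumes one miss per TTL window and evenly spaced requests.
    """
    requests_per_window = requests_per_second * ttl_seconds
    if requests_per_window <= 1:
        return 0.0  # the entry expires before anyone reuses it
    return 1 - 1 / requests_per_window


# TTLs loosely follow the text; the request rate of 10/s is an assumption.
for name, ttl in [("weather", 3600), ("news headlines", 3600), ("exchange rates", 300)]:
    ratio = expected_hit_ratio(10, ttl)
    print(f"{name:>14}: TTL {ttl:>4}s -> hit ratio {ratio:.4f}")
```

When the request rate is so low that fewer than one request arrives per TTL window, the estimate drops to zero, which is the case where caching buys nothing.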

9.4 — Rate Limiting

One last use case that comes to mind, since we are talking about rate limiting: the mechanism itself is most often implemented using a technology like Redis or another in-memory cache.

The way rate limiting is implemented: it is usually some kind of middleware which sits somewhere in the middle of the request pipeline — that's why it's called middleware. Before the request is passed to your route or controller, it goes through this rate limit middleware first.

How Rate Limiting Middleware Works Step by Step

The middleware reads the client's IP address from the incoming request. Behind a reverse proxy such as Nginx, the TCP connection comes from the proxy itself, so the original client's public IP is carried in a header, usually X-Forwarded-For, which the proxy adds. When a request passes through several proxies, the header holds a comma-separated chain of addresses, and the left-most entry is the original client. Because clients can also send this header themselves, trust it only when it is set by a proxy you control.
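Extracting the client IP from that header can be sketched as follows. The function name `client_ip` and the plain-dict `headers` argument are assumptions for illustration; real frameworks expose headers through their own (usually case-insensitive) objects.

```python
def client_ip(headers, remote_addr):
    """Return the originating client IP to use as the rate limiting key.

    X-Forwarded-For may carry a chain such as "client, proxy1, proxy2";
    the left-most entry is the original client. Falls back to the socket
    address when the header is absent.
    """
    forwarded = headers.get("X-Forwarded-For", "")
    first = forwarded.split(",")[0].strip()
    return first or remote_addr


# The addresses below are documentation-range examples, not real clients.
print(client_ip({"X-Forwarded-For": "203.0.113.7, 10.0.0.2"}, "10.0.0.2"))
print(client_ip({}, "198.51.100.4"))
```

The fallback matters for requests that reach the server directly, without passing through the proxy.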

The job of this middleware is:

  1. Extract the X-Forwarded-For header from the incoming request to get the client's IP address
  2. Check Redis for a counter associated with that IP address for the current time window (say, per minute)
  3. Increment that counter by 1
  4. If the counter exceeds the configured limit (say, 50 requests per minute), block the request and return HTTP status 429 Too Many Requests
  5. If under the limit, pass the request through to the actual route handler

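Let's say the condition is that a particular client can make only 50 requests in 1 minute. Then whenever a request comes in, the counter for that client's IP is incremented: requests 1 through 50 pass through, and the 51st request in the same minute is rejected with 429.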

The TTL on the key is set to 1 minute. After 1 minute, the key automatically expires, the counter resets, and the client can make requests again. This is a clean, automatic window with zero cleanup code needed.
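The counter-and-TTL behaviour above can be simulated end to end. In this sketch, `FakeRedis` is an in-memory stand-in that mimics only INCR and EXPIRE with an explicit clock; the key format and the 50-per-minute limit follow the text, while the class and function names are hypothetical.

```python
class FakeRedis:
    """In-memory stand-in mimicking only INCR and EXPIRE, with an explicit clock."""

    def __init__(self):
        self._values = {}
        self._expires_at = {}

    def _purge(self, key, now):
        if key in self._expires_at and now >= self._expires_at[key]:
            self._values.pop(key, None)
            self._expires_at.pop(key, None)

    def incr(self, key, now):
        self._purge(key, now)
        self._values[key] = self._values.get(key, 0) + 1
        return self._values[key]

    def expire(self, key, ttl_seconds, now):
        if key in self._values:
            self._expires_at[key] = now + ttl_seconds


MAX_REQUESTS = 50
WINDOW_SECONDS = 60


def check_rate_limit(store, ip, now):
    """Return the HTTP status the middleware would produce: 200 or 429."""
    window = int(now // WINDOW_SECONDS)
    key = f"rate:{ip}:{window}"
    count = store.incr(key, now)
    if count == 1:
        store.expire(key, WINDOW_SECONDS, now)  # first hit in the window sets the TTL
    return 429 if count > MAX_REQUESTS else 200


store = FakeRedis()
first_minute = [check_rate_limit(store, "10.0.0.1", now=5) for _ in range(51)]
next_minute = check_rate_limit(store, "10.0.0.1", now=65)
```

The first 50 calls return 200, the 51st returns 429, and the call in the following minute returns 200 again because it lands on a fresh key.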

[Figure: rate limiting middleware flow with counter walkthrough. Client (IP 10.0.0.1) → Rate Limit Middleware (reads X-Forwarded-For, checks and increments the counter) → Redis counter "rate:10.0.0.1:<minute>" → 37, TTL 60s (auto-resets each minute). count ≤ 50 → pass; count > 50 → HTTP 429 Too Many Requests.]
Fig 7.1 — Rate Limiting: middleware reads IP → increments Redis counter → blocks at 51st request

Why Redis and Not a Relational Database for Rate Limiting

This is a fair question. You could store the counter in PostgreSQL or MySQL: both persist data and can return it on demand. The difference is that reading from and writing to a relational database takes longer, and because rate limiting runs on every single request, even an extra 20 or 30 milliseconds has a significant impact on API latency.

Storing the counter in a relational database means one database call per request, which hurts in two ways. First, latency increases for every API, since each request now waits on that extra call. Second, the load on the database increases. Say there are a thousand users each making 100 requests per minute: that's 100,000 database queries per minute just for rate limiting counter increments. Your database would be flooded with the overhead of the rate limiting layer alone.

That is why rate limiting state is kept in an in-memory database like Redis rather than in the relational database: it keeps the check as fast as possible, minimising API latency, and it takes the load off the database. Redis's atomic INCR command is particularly well suited to the job. It increments a key by 1 in a single atomic operation, so concurrent requests cannot overwrite each other's updates and no transactions or locks are needed.
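To see why atomicity matters, here is a deterministic walk-through of the lost update that a separate read-then-write can produce. A dictionary stands in for the counter store, and `atomic_incr` only models the fact that Redis runs each INCR as one indivisible step; it is not itself thread-safe Python.

```python
counter = {"rate:10.0.0.1": 5}

# Non-atomic read-modify-write: both requests read before either one writes.
read_a = counter["rate:10.0.0.1"]      # request A sees 5
read_b = counter["rate:10.0.0.1"]      # request B also sees 5
counter["rate:10.0.0.1"] = read_a + 1  # A writes 6
counter["rate:10.0.0.1"] = read_b + 1  # B writes 6, so A's increment is lost
lost_update_result = counter["rate:10.0.0.1"]


def atomic_incr(store, key):
    """Models INCR: the read and the write happen as one indivisible step."""
    store[key] = store.get(key, 0) + 1
    return store[key]


counter["rate:10.0.0.1"] = 5
atomic_incr(counter, "rate:10.0.0.1")  # request A
atomic_incr(counter, "rate:10.0.0.1")  # request B
atomic_result = counter["rate:10.0.0.1"]
```

With the interleaved read-then-write the counter ends at 6 after two requests, so one request goes uncounted; with the atomic increment it ends at 7.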


10 · CODE

Code Examples

Below are practical implementations of caching patterns using Go and Python — the two languages referenced in this course.

10.1 — Lazy (Cache-Aside) Caching in Go

cache_aside.go · Go + github.com/redis/go-redis/v9
package main

import (
    "context"
    "encoding/json"
    "fmt"
    "time"

    "github.com/redis/go-redis/v9"
)

type Product struct {
    ID    string  `json:"id"`
    Name  string  `json:"name"`
    Price float64 `json:"price"`
}

var rdb = redis.NewClient(&redis.Options{
    Addr: "localhost:6379",
})

// GetProduct implements Cache-Aside (Lazy) caching.
// 1. Check Redis first
// 2. On miss, fetch from DB, store in cache, return
func GetProduct(ctx context.Context, productID string) (*Product, error) {
    cacheKey := "product:" + productID

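    // Step 1: Try cache first (Cache Hit path)
    cached, err := rdb.Get(ctx, cacheKey).Result()
    if err == nil {
        var product Product
        // Only trust the cached entry if it decodes cleanly;
        // otherwise fall through and reload from the database.
        if err := json.Unmarshal([]byte(cached), &product); err == nil {
            fmt.Println("[CACHE HIT]", productID)
            return &product, nil
        }
    } else if err != redis.Nil {
        // redis.Nil means "key not found"; anything else is a Redis failure.
        fmt.Println("[CACHE ERROR]", err)
    }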

    // Step 2: Cache Miss — fetch from database (expensive operation)
    fmt.Println("[CACHE MISS] fetching from DB...", productID)
    product, err := fetchFromDatabase(productID) // simulate DB call
    if err != nil {
        return nil, err
    }

    // Step 3: Store in cache with a 1-hour TTL (best effort: a failed
    // cache write should not fail the read)
    if data, err := json.Marshal(product); err == nil {
        rdb.Set(ctx, cacheKey, data, 1*time.Hour)
    }

    return product, nil
}

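// UpdateProduct implements write-through: update the DB first, then the cache.
// fetchFromDatabase and updateInDatabase are placeholders for your data layer.
func UpdateProduct(ctx context.Context, product *Product) error {
    // Step 1: Update in database (the source of truth)
    if err := updateInDatabase(product); err != nil {
        return err
    }

    // Step 2: Write-through: refresh the cache right after the DB write
    cacheKey := "product:" + product.ID
    data, err := json.Marshal(product)
    if err != nil {
        return err
    }
    if err := rdb.Set(ctx, cacheKey, data, 1*time.Hour).Err(); err != nil {
        return fmt.Errorf("cache update failed after DB write: %w", err)
    }

    fmt.Println("[WRITE-THROUGH] DB + cache updated for", product.ID)
    return nil
}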

10.2 — Rate Limiting Middleware in Go

rate_limiter.go · Go + go-redis
package middleware

import (
    "fmt"
    "net"
    "net/http"
    "strings"
    "time"

    "github.com/redis/go-redis/v9"
)

const (
    maxRequests = 50
    windowTime  = 1 * time.Minute
)

// clientIP returns the caller's address. X-Forwarded-For may hold a
// comma-separated chain ("client, proxy1, proxy2"); the first entry is the
// original client. Only trust it when a reverse proxy you control sets it.
func clientIP(r *http.Request) string {
    if xff := r.Header.Get("X-Forwarded-For"); xff != "" {
        first, _, _ := strings.Cut(xff, ",")
        return strings.TrimSpace(first)
    }
    // RemoteAddr is "host:port"; strip the port so one client maps to one key.
    if host, _, err := net.SplitHostPort(r.RemoteAddr); err == nil {
        return host
    }
    return r.RemoteAddr
}

// RateLimitMiddleware wraps next and rejects clients that exceed
// maxRequests per fixed one-minute window.
func RateLimitMiddleware(rdb *redis.Client, next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        ctx := r.Context()

        // Redis key: per IP, per minute window
        key := fmt.Sprintf("rate_limit:%s:%d", clientIP(r), time.Now().Unix()/60)

        // INCR is atomic: no race condition even with concurrent requests
        count, err := rdb.Incr(ctx, key).Result()
        if err != nil {
            http.Error(w, "Internal Server Error", http.StatusInternalServerError)
            return
        }

        // Set TTL on first request of this window (key is new)
        if count == 1 {
            rdb.Expire(ctx, key, windowTime)
        }

        // Check if limit exceeded
        if count > maxRequests {
            w.Header().Set("Retry-After", "60")
            http.Error(w, "429 Too Many Requests", http.StatusTooManyRequests)
            return
        }

        // Under the limit: report remaining quota, then run the actual handler
        w.Header().Set("X-RateLimit-Remaining", fmt.Sprintf("%d", maxRequests-count))
        next.ServeHTTP(w, r)
    })
}

10.3 — Caching Decorator in Python

cache.py · Python + redis-py + FastAPI
import json
import functools
import redis
from fastapi import FastAPI

app = FastAPI()

# Connect to Redis
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def cache_result(ttl: int = 3600):
    """
    Decorator that implements lazy (cache-aside) caching.
    ttl: time-to-live in seconds (default 1 hour)
    """
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
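            # Build cache key from function name + args
            # (sorted kwargs keep the key stable regardless of argument order)
            cache_key = f"{func.__name__}:{args}:{sorted(kwargs.items())}"

            # Step 1: Check cache (Cache Hit)
            # Note: r is a synchronous client, so this call briefly blocks the event loop
            cached = r.get(cache_key)
            if cached is not None: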
                print(f"[CACHE HIT] {cache_key}")
                return json.loads(cached)

            # Step 2: Cache Miss — execute the actual function
            print(f"[CACHE MISS] calling {func.__name__}...")
            result = await func(*args, **kwargs)

            # Step 3: Store in Redis with TTL
            r.setex(cache_key, ttl, json.dumps(result))

            return result
        return wrapper
    return decorator


# Usage: apply cache decorator to any route handler
@app.get("/products/{product_id}")
@cache_result(ttl=3600)  # cache for 1 hour
async def get_product(product_id: str):
    # Expensive DB query (fetch_from_db is a placeholder); only runs on cache miss
    product = await fetch_from_db(product_id)
    return product


# TTL-based API Response Caching (e.g. weather)
@app.get("/weather/{city}")
async def get_weather(city: str):
    cache_key = f"weather:{city}"

    # Check cache (TTL = 1 hour, weather doesn't change every minute)
    cached = r.get(cache_key)
    if cached is not None:
        return {"source": "cache", "data": json.loads(cached)}

    # Miss: call external weather API (costs money / rate limited)
    weather_data = await call_weather_api(city)

    # Store for 1 hour
    r.setex(cache_key, 3600, json.dumps(weather_data))

    return {"source": "api", "data": weather_data}

10.4 — Session Management with Redis (Python)

session.py · Python + redis-py
import uuid
import json
import redis
from datetime import timedelta

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def create_session(user_id: str, user_data: dict) -> str:
    """Create a session after successful login. Store in Redis."""
    session_id = str(uuid.uuid4())
    session_key = f"session:{session_id}"

    # Store session data with 24-hour TTL
    # TTL ensures sessions auto-expire — no manual cleanup needed
    r.setex(
        session_key,
        timedelta(hours=24),
        json.dumps({"user_id": user_id, **user_data})
    )

    return session_id  # returned to client as cookie/token


def get_session(session_id: str) -> dict | None:
    """
    Validate session on every authenticated API request.
    Redis O(1) lookup — microseconds, not milliseconds.
    """
    session_key = f"session:{session_id}"
    data = r.get(session_key)

    if not data:
        return None  # session expired or invalid

    # Optionally: refresh TTL on activity (sliding window)
    r.expire(session_key, timedelta(hours=24))

    return json.loads(data)


def delete_session(session_id: str):
    """Logout — delete session from Redis immediately."""
    r.delete(f"session:{session_id}")

BACKEND ENGINEERING FIELD MANUAL · V2 · CHAPTER 12 · CACHING
Notes compiled from lecture transcript · Go + Python examples · MDN & Redis references inline